[BugFix] Fix frontend multiprocessing hang #7217

maxdebayser · 2024-08-06T19:50:37Z

Now that the OpenAI server has optional multiprocessing, a hang can occur if the backend dies during initialization and never replies to the IS_SERVER_READY message sent in async_engine_client.setup().

This PR adds an optional timeout to _send_one_way_rpc_request so that, during initialization, we can periodically check whether the server process is still running while we wait for the reply to the IS_SERVER_READY message.
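The idea can be illustrated with a minimal sketch. This is not the code from this PR: a standard-library `queue.Queue` stands in for the RPC socket, and `wait_until_ready`, `replies`, `is_alive`, and `poll_timeout_s` are hypothetical names chosen for the example.

```python
import queue
from typing import Any, Callable


def wait_until_ready(
    replies: "queue.Queue[Any]",
    is_alive: Callable[[], bool],
    poll_timeout_s: float = 1.0,
) -> Any:
    """Block until a reply arrives, failing fast if the backend has died.

    All names are hypothetical; `replies` stands in for the RPC socket.
    """
    while True:
        try:
            # Wait for the readiness reply, but only for one polling interval.
            return replies.get(timeout=poll_timeout_s)
        except queue.Empty:
            # No reply yet: distinguish "still starting" from "already dead".
            if not is_alive():
                raise RuntimeError(
                    "Backend process died before replying to the readiness check"
                ) from None
```

With a `multiprocessing.Process` handle the liveness callable could be `proc.is_alive`; with a `subprocess.Popen` handle it could be `lambda: proc.poll() is None`.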

FIX #7213

FYI @njhill @robertgshaw2-neuralmagic

If the server dies, the frontend keeps waiting forever for it to come up.

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

github-actions · 2024-08-06T19:50:50Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which consists of a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of the default ones by unblocking the steps in your fast-check build on the Buildkite UI.

Once the PR is approved and ready to go, please make sure to run full CI as it is required to merge (or just use auto-merge).

To run full CI, you can do one of these:

Comment /ready on the PR
Add ready label to the PR
Enable auto-merge.

🚀

njhill

Thanks @maxdebayser

vllm/entrypoints/openai/rpc/client.py

vllm/entrypoints/openai/api_server.py

robertgshaw2-neuralmagic · 2024-08-07T00:04:28Z

I wonder if there is a way we can send a health check rather than waiting on the timeout. 1000 seconds is more than 15 minutes, which is a very long time to wait before detecting failure. But 1000 seconds is also not that long if we are downloading L405

maxdebayser · 2024-08-07T00:17:30Z

Actually the unit is milliseconds

robertgshaw2-neuralmagic · 2024-08-07T00:19:24Z

I see. I think this could be a problem because download times can be much longer than the timeout here. I'm going to post an alternate proposal.

^ Ignore this; I see the polling loop now.

njhill · 2024-08-07T00:19:41Z

Yeah, this is effectively just polling once per second. Maybe add a comment, since that confused me too initially. I think some of this can be refactored a bit soon anyhow.
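To make the distinction concrete: the timeout is a per-poll interval in milliseconds, not a deadline for the whole startup. The sketch below is illustrative only; `FakeSocket`, `wait_for_reply`, and `POLL_TIMEOUT_MS` are hypothetical names, and the fake socket returns immediately instead of blocking.

```python
POLL_TIMEOUT_MS = 1000  # hypothetical constant; the unit is part of the name


class FakeSocket:
    """Stand-in for a socket whose poll() takes a timeout in milliseconds."""

    def __init__(self, ready_after_polls: int) -> None:
        self._remaining = ready_after_polls

    def poll(self, timeout_ms: int) -> bool:
        # A real socket would block for up to timeout_ms; the fake does not.
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True


def wait_for_reply(sock, is_alive, poll_timeout_ms=POLL_TIMEOUT_MS) -> int:
    """Return how many polls timed out before the reply arrived."""
    timed_out = 0
    while not sock.poll(poll_timeout_ms):
        # Each timeout is only a checkpoint for the liveness test.
        if not is_alive():
            raise RuntimeError("Server process died while waiting for a reply")
        timed_out += 1
    return timed_out


# A slow but healthy backend (e.g. a long model download) still succeeds.
slow = FakeSocket(ready_after_polls=5)
waited = wait_for_reply(slow, is_alive=lambda: True)
print(f"reply after {waited} timed-out polls")
```

The total wait is unbounded as long as the process stays alive, so a long download does not trip the timeout; only a dead process ends the loop early.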

robertgshaw2-neuralmagic · 2024-08-07T00:33:38Z

Okay, I see now where the polling is

robertgshaw2-neuralmagic · 2024-08-07T00:33:49Z

Alternatively, we could go with the following:

detect failure #7234

vllm/entrypoints/openai/api_server.py

vllm/entrypoints/openai/rpc/client.py

robertgshaw2-neuralmagic · 2024-08-07T00:52:16Z

I think your approach is better than mine. Can you please add a test case for this?

maxdebayser · 2024-08-07T02:10:37Z

Thanks! I've added a new file for the test; I wasn't sure whether an existing test file would be a good fit for this.

robertgshaw2-neuralmagic · 2024-08-07T13:10:54Z

I'll wait for Nick's final sign-off, but LGTM. Thanks for the fix!

njhill

Thanks @maxdebayser

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>

Fix frontend multiprocessing hang

c7ea323

If the server dies, the frontend keeps waiting forever for it to come up.

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

Merge branch 'vllm-project:main' into fix_multiprocess_hang

a296788

maxdebayser mentioned this pull request Aug 6, 2024

updates for vLLM==0.5.4 opendatahub-io/vllm-tgis-adapter#82

Merged

njhill reviewed Aug 6, 2024

vllm/entrypoints/openai/rpc/client.py

vllm/entrypoints/openai/rpc/client.py

vllm/entrypoints/openai/api_server.py

njhill changed the title ~~Fix frontend multiprocessing hang~~ [BugFix] Fix frontend multiprocessing hang Aug 6, 2024

maxdebayser added 2 commits August 6, 2024 21:44

address review comments

c120770

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

Merge branch 'fix_multiprocess_hang' of github.com:maxdebayser/vllm into fix_multiprocess_hang

9aea142

robertgshaw2-neuralmagic reviewed Aug 7, 2024

vllm/entrypoints/openai/api_server.py

robertgshaw2-neuralmagic reviewed Aug 7, 2024

vllm/entrypoints/openai/rpc/client.py

robertgshaw2-neuralmagic reviewed Aug 7, 2024

vllm/entrypoints/openai/rpc/client.py

maxdebayser added 4 commits August 6, 2024 22:03

improve exception message

33e503b

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

add unit to constant name

fbefbe0

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

Expand comment about zmq socket options

72c4fb7

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

add unit test

6283c66

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

Merge branch 'vllm-project:main' into fix_multiprocess_hang

093c7a4

robertgshaw2-neuralmagic approved these changes Aug 7, 2024

njhill approved these changes Aug 7, 2024

njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) Aug 7, 2024

Merge branch 'main' into fix_multiprocess_hang

a2f2639

robertgshaw2-neuralmagic enabled auto-merge (squash) August 7, 2024 16:28

robertgshaw2-neuralmagic merged commit fde47d3 into vllm-project:main Aug 7, 2024
51 checks passed

maxdebayser deleted the fix_multiprocess_hang branch August 27, 2024 16:10
